17 research outputs found

    A Column Generation Approach for Pure Parsimony Haplotyping

    Get PDF

    A branch-and-price approach for Pure Parsimony haplotyping

    Get PDF
    This thesis comes as the result of a detailed study of decomposition methods for large-scale problems and their application to a particular problem arising in computational biology. The improvements on computer capabilities and programming techniques in the last decades have widened the set of problems that can be easily solved as Mixed Integer Linear programs. However, several applications still require formulations that involve a non-tractable amount of data necessary to describe the geometry of the solution space. In these cases, decomposition methods are used to reduce the size of the problems to be addressed. In this thesis we propose the application of some of these methods, as Dantzig-Wolfe reformulation, column generation and Lagrangian relaxation, to a problem related to the study of the human genome. The human DNA is made of two double chains, each of which consists in a sequence of nucleotides. Among these, the ones related to the Single Nucleotide Polymorphisms (SNPs) are interesting as they describe the differences between individuals. We define a haplotype as a sequence of nucleotides that describes a portion of the SNPs found in a particular chromosome, and a genotype as the sequence that aggregates the information on SNPs coming from the double DNA chain of an individual. The problem we address falls into the class defining the Haplotyping Inference problem, that consists in recovering the structure of the haplotypes, given the information on the genotypes. In particular, we consider the parsimony criterion, which means that we want to find the minimum number of haplotypes able to explain all the genotypes. This problem is known to be APX-hard. There are several contributions in the literature that can be divided into two main different classes of mixed integer linear formulations. The first one presents a polynomial number of both variables and constraints, thus these formulations are solved using a branch-and-cut approach. The second class consists of formulations that present an exponential number of constraints and variables, solved with a branch-and-cut-and-price approach. The scope of this thesis is to investigate how a new formulation that involves an exponential number of variables and a polynomial number of constraints can be solved by a branch-and-price approach. Its aim is to provide a competitive algorithm with respect to other formulations from the literature, in particular those with a polynomial number of constraints and variables. We start by providing a review of the state of the art on the Haplotype Inference problem, with particular focus on the Mixed Integer Linear programming approaches for the Haplotype Inference by Pure Parsimony (HIPP) problem. We then consider a new mathematical programming formulation for HIPP that includes a set of quadratic constraints. By applying Dantzig-Wolfe reformulation, we obtained a new integer linear programming formulation, presenting an exponential number of variables and a polynomial number of constraints on the input data. This model is the basis for the development of a branch-and-price approach. Due to the large number of variables involved, a column-generation approach is needed to solve the linear relaxation at a generic node of the search tree. An initial feasible solution is easily found by means of heuristics and used as starting point to build the Restricted Master Problem (RMP). In order to find variables to be added to the RMP, we solve a dedicated subproblem, the pricing problem, that in our case presents a quadratic objective function. We propose different ways of solving the pricing problem. Among the exact methods, we consider the integer linear model obtained by linearizing the quadratic objective function and a Smart Enumeration approach, that partitions the set of feasible solutions and solves the pricing problem restricted to each subset, exploiting some extra available information to further reduce the size of the subproblems. As heuristic approaches, we at first note that the pricing problem is easily solved for particular haplotypes. Then, for investigating the remaining solutions we propose a local search-based heuristic and an Early-terminated Smart Enumeration, where we stop the Smart Enumeration approach as soon as we find a variable that can be added to the RMP. The oscillatory behaviour of the dual variables involved in the definition of the pricing problem is limited by introducing a stabilization technique adapted to our formulation. In particular, we extended the proof of convergence of this procedure, that consists in using dual values obtained as convex combinations between real dual variables and a chosen stability center, to the cases in which the stabilized dual variables are feasible for the dual problem. In order to solve the integer model, the solution of the linear relaxation is embedded in a branch-and-price approach. The branching rule we present is inspired to the well-known Ryan-Foster branching rule for set-partitioning problems. The correctness of our approach has been proved. Further observations on the similarity of the formulation's constraints to multiple set-covering ones suggest that we can relax a family of constraints to obtain a new formulation similar to a multiple set-covering. However, we note that the proposed branch-and-price algorithm applied to this formulation does not provide a feasible solution for HIPP, thus we need to integrate the proposed branching rule and recover a feasible optimal solution for HIPP. This branch-and-price approach has been implemented in C++, with the aid of SCIP libraries and Cplex solver. Results have been obtained from different classes of instances found in literature, coming from real biological data and generated using ad-hoc programs, as well as newly generated ones. The branch-and-price approach proposed for our formulation proves to be competitive with state-of-the-art polynomial-sized formulations. In fact, we can note how the linear relaxation of our formulation is tighter than other linear relaxations and provides an effective starting solution for the branch-and-price algorithm. Results show how our approach is efficient, in particular on the set of instances that contain a larger number of genotypes We proved therefore that a branch-and-price procedure provides a good solution approach for a formulation with exponential number of variables and polynomial number of constraints. Further work may include enhancements on the implementation details, such as exploring different ways of ordering the genotypes or combining heuristic and exact methods in the stabilized framework to solve the pricing problem. Moreover, it is possible to investigate the generalization of the proposed approach in order to solve set-partitioning problems

    How to Organize an OR Workshop: the AIROYoung Experience

    Get PDF
    AbstractAIROYoung is the Young Researchers Chapter of AIRO (Italian Association of Operations Research). In this work, we focus on the AIROYoung Workshop, an annual event which has been reuniting young researchers constituting this community from 2017. Describing all the editions of the initiative, we highlight the main guidelines we have been following so far to organize it (e.g., no registration fee, grants for free accommodation, no parallel sessions, and a look to sustainability). We analyze surveys filled by participants to present qualitative and quantitative data about our workshops and to show how interest and attendance have increased through the years. We talk about some innovations we have introduced to improve quality (e.g., the PhD school "from young researchers to young researchers" and the pitch talk session). We then share our personal opinion and, last but not least, we invite young researchers to join in the preparation of the next AIROYoung Workshops

    The Tick Formulation for deadlock detection and avoidance in railways traffic control

    Get PDF
    Wrong dispatching decisions may lead to deadlocks, where trains reciprocally block resources necessary to reach their destinations. It is crucial to develop tools to detect such potential deadlocks on time, in order to reverse the decisions previously taken by dispatchers or to take recovery actions. In this paper we present a new 0,1 linear formulation for detecting deadlocks and optimally park the involved trains to reduce congestion around the affected area. We discuss computational results on some realistic randomly generated instances to show the validity of the approach, as well as its limits.acceptedVersio

    Easy Cases of Deadlock Detection in Train Scheduling

    Get PDF
    A deadlock occurs when two or more trains are preventing each other from moving forward by occupying the required tracks. Deadlocks are rare but pernicious events in railroad operations and, in most cases, are caused by human errors. Recovering is a time-consuming and costly operation, producing large delays and often requiring crew rescheduling and complex switching moves. In practice, most deadlocks involve only two long trains missing their last potential meet location. In this paper, we prove that, for any network configuration, the identification of two-train deadlocks can be performed in polynomial time. This is the first exact polynomial algorithm for such a practically relevant combinatorial problem. We also develop a pseudo-polynomial but efficient oracle that allows real-time early detection and prevention of any (potential) two-train deadlock in the Union Pacific (a U.S. class 1 rail company) railroad network. A deadlock prevention module based on the work in this paper will be put in place at Union Pacific to prevent all deadlocks of this kind.acceptedVersio

    Strengthened Formulations and Valid Inequalities for Single Delay Management in Public Transportation

    Get PDF
    International audienceThe Delay Management Problem arises in Public Transportation networks, often characterized by the necessity of connections between different vehicles. The attractiveness of Public Transportation networks is strongly related to the reliability of connections, which can be missed when delays or other unpredictable events occur. Given a single initial delay at one node of the network, the Delay Management Problem is to determine which vehicles have to wait for the delayed ones, with the aim of minimizing the dissatisfaction of the passengers. In this paper, we present strengthened mixed integer linear programming formulations and new families of valid inequalities. The implementation of branch-and-cut methods and tests on a benchmark of instances taken from real networks show the potential of the proposed formulations and cuts

    Incorporating Stakeholders’ priorities and preferences in 4D trajectory optimization

    Get PDF
    A key feature of trajectory based operations (TBO) -- a new concept developed to modernize the air traffic system -- is the inclusion of preferences and priorities of the air traffic management (ATM) stakeholders. In this paper, we present a new mathematical model to optimize flights' 4D-trajectories. This is a multi-objective binary integer programming (IP) model, which assigns a 4D-trajectory to each flight, while explicitly modeling priorities and highlighting the trade off involved with the Airspace Users (AUs) preferences. The scope of the model (to be used at pre-tactical level) is the computation of optimal 4D pre-departure trajectory for each flight to be shared or negotiated with other stakeholders and subsequently managed throughout the flight. These trajectories are obtained by minimising the deviation (delay and re-routing) from the original preferred 4D-trajectories as well as minimizing the air navigation service (ANS) charges subject to the constraints of the system. Computational results for the model are presented, which show that the proposed model has the ability to identify trade-offs between the objectives of the stakeholders of the ATM system under the TBO concept. This can therefore provide the ATM stakeholders with useful decision tools to choose a trajectory for each flight

    A branch-and-price approach for Pure Parsimony haplotyping

    Get PDF
    This thesis comes as the result of a detailed study of decomposition methods for large-scale problems and their application to a particular problem arising in computational biology. The improvements on computer capabilities and programming techniques in the last decades have widened the set of problems that can be easily solved as Mixed Integer Linear programs. However, several applications still require formulations that involve a non-tractable amount of data necessary to describe the geometry of the solution space. In these cases, decomposition methods are used to reduce the size of the problems to be addressed. In this thesis we propose the application of some of these methods, as Dantzig-Wolfe reformulation, column generation and Lagrangian relaxation, to a problem related to the study of the human genome. The human DNA is made of two double chains, each of which consists in a sequence of nucleotides. Among these, the ones related to the Single Nucleotide Polymorphisms (SNPs) are interesting as they describe the differences between individuals. We define a haplotype as a sequence of nucleotides that describes a portion of the SNPs found in a particular chromosome, and a genotype as the sequence that aggregates the information on SNPs coming from the double DNA chain of an individual. The problem we address falls into the class defining the Haplotyping Inference problem, that consists in recovering the structure of the haplotypes, given the information on the genotypes. In particular, we consider the parsimony criterion, which means that we want to find the minimum number of haplotypes able to explain all the genotypes. This problem is known to be APX-hard. There are several contributions in the literature that can be divided into two main different classes of mixed integer linear formulations. The first one presents a polynomial number of both variables and constraints, thus these formulations are solved using a branch-and-cut approach. The second class consists of formulations that present an exponential number of constraints and variables, solved with a branch-and-cut-and-price approach. The scope of this thesis is to investigate how a new formulation that involves an exponential number of variables and a polynomial number of constraints can be solved by a branch-and-price approach. Its aim is to provide a competitive algorithm with respect to other formulations from the literature, in particular those with a polynomial number of constraints and variables. We start by providing a review of the state of the art on the Haplotype Inference problem, with particular focus on the Mixed Integer Linear programming approaches for the Haplotype Inference by Pure Parsimony (HIPP) problem. We then consider a new mathematical programming formulation for HIPP that includes a set of quadratic constraints. By applying Dantzig-Wolfe reformulation, we obtained a new integer linear programming formulation, presenting an exponential number of variables and a polynomial number of constraints on the input data. This model is the basis for the development of a branch-and-price approach. Due to the large number of variables involved, a column-generation approach is needed to solve the linear relaxation at a generic node of the search tree. An initial feasible solution is easily found by means of heuristics and used as starting point to build the Restricted Master Problem (RMP). In order to find variables to be added to the RMP, we solve a dedicated subproblem, the pricing problem, that in our case presents a quadratic objective function. We propose different ways of solving the pricing problem. Among the exact methods, we consider the integer linear model obtained by linearizing the quadratic objective function and a Smart Enumeration approach, that partitions the set of feasible solutions and solves the pricing problem restricted to each subset, exploiting some extra available information to further reduce the size of the subproblems. As heuristic approaches, we at first note that the pricing problem is easily solved for particular haplotypes. Then, for investigating the remaining solutions we propose a local search-based heuristic and an Early-terminated Smart Enumeration, where we stop the Smart Enumeration approach as soon as we find a variable that can be added to the RMP. The oscillatory behaviour of the dual variables involved in the definition of the pricing problem is limited by introducing a stabilization technique adapted to our formulation. In particular, we extended the proof of convergence of this procedure, that consists in using dual values obtained as convex combinations between real dual variables and a chosen stability center, to the cases in which the stabilized dual variables are feasible for the dual problem. In order to solve the integer model, the solution of the linear relaxation is embedded in a branch-and-price approach. The branching rule we present is inspired to the well-known Ryan-Foster branching rule for set-partitioning problems. The correctness of our approach has been proved. Further observations on the similarity of the formulation's constraints to multiple set-covering ones suggest that we can relax a family of constraints to obtain a new formulation similar to a multiple set-covering. However, we note that the proposed branch-and-price algorithm applied to this formulation does not provide a feasible solution for HIPP, thus we need to integrate the proposed branching rule and recover a feasible optimal solution for HIPP. This branch-and-price approach has been implemented in C++, with the aid of SCIP libraries and Cplex solver. Results have been obtained from different classes of instances found in literature, coming from real biological data and generated using ad-hoc programs, as well as newly generated ones. The branch-and-price approach proposed for our formulation proves to be competitive with state-of-the-art polynomial-sized formulations. In fact, we can note how the linear relaxation of our formulation is tighter than other linear relaxations and provides an effective starting solution for the branch-and-price algorithm. Results show how our approach is efficient, in particular on the set of instances that contain a larger number of genotypes We proved therefore that a branch-and-price procedure provides a good solution approach for a formulation with exponential number of variables and polynomial number of constraints. Further work may include enhancements on the implementation details, such as exploring different ways of ordering the genotypes or combining heuristic and exact methods in the stabilized framework to solve the pricing problem. Moreover, it is possible to investigate the generalization of the proposed approach in order to solve set-partitioning problems.Questa tesi rappresenta il risultato dello studio dettagliato di metodi di decomposizione per problemi di grandi dimensioni e la loro applicazione ad un problema particolare di biologia computazionale. I progressi nelle prestazioni dei computer e nelle tecniche di programmazione degli ultimi decenni hanno ampliato l’insieme di problemi che possono essere facilmente risolti come programmi lineari interi. Tuttavia, molte applicazioni richiedono ancora formulazioni che interessano una quantità di informazioni non trattabili per descrivere la geometria dello spazio delle soluzioni. In questi casi, vengono usati metodi di decomposizione per ridurre la dimensione del problema. In questa tesi proponiamo l’applicazione di alcuni di questi metodi, come la riformulazione di Dantzig-Wolfe, la generazione di colonne e il rilassamento Lagrangiano, ad un problema legato allo studio del genoma umano. Il DNA umano è costituito da due doppie catene formate da sequenze di nucleotidi. Tra questi nucleotidi, siamo interessati a quelli che individuano Single Nucleotide Polymorphisms (SNPs), in quanto descrivono le differenze tra individui della stessa specie. Definiamo un aplotipo come una sequenza di nucleotidi che descrivono una parte degli SNPs di un particolare cromosoma, e un genotipo come una sequenza che aggrega le informazioni sugli SNPs per una doppia catena di DNA. Il problema oggetto di studio ricade nella più ampia famiglia di problemi riguardanti l’Haplotype Inference, che consiste nel ricavare la struttura degli aplotipi avendo solo l’informazione riguardante i genotipi di una popolazione. In particolare, consideriamo il criterio di Pure Parsimony, per il quale cerchiamo il numero minimo di aplotipi necessari a formare tutti i genotipi dati. Questo problema è APX-hard. I diversi approcci usati per la soluzione di questo problema possono essere divisi in due principali categorie: la prima utilizza formulazioni con un numero polinomiale di variabili e vincoli, che sono risolte applicando un algoritmo branch-and-cut; la seconda classe presenta formulazioni con numero esponenziale di variabili e vincoli, risolte con un algoritmo di tipo branch-and-cut-and-price. Questa tesi si prefigge lo scopo di studiare come risolvere con un approccio di tipo branch-and-price una nuova formulazione che presenta un numero esponenziale di variabili e polinomiale di vincoli, ottenendo un algoritmo competitivo rispetto a quelli proposti in letteratura. Inizialmente, proponiamo un excursus nello stato dell’arte per il problema di Haplotype Inference, con particolare attenzione agli approcci di programmazione lineare intera per il problema di Haplotype Inference by Pure Parsimony (HIPP) problem. Successivamente consideriamo una nuova formulazione per HIPP che comprende un insieme di vincoli quadratici. Applicando la riformulazione di Dantzig-Wolfe, otteniamo una nuova formulazione lineare intera, con un numero esponenziale di variabili e polinomiale di vincoli. Questo modello viene usato per implementare un algoritmo di tipo branch-and-price. A causa del grande numero di variabili coinvolte, abbiamo bisogno di un algoritmo di generazione di colonne per risolvere il rilassamento lineare ad un qualsiasi nodo dell’albero di branching. Una soluzione iniziale ammissibile si trova facilmente utilizzando delle euristiche, e viene usata per costruire il Restricted Master Problem (RMP). Il pricing problem risultante è quadratico. Proponiamo diversi modi per risolvere il pricing problem. Tra i metodi esatti, consideriamo il modello lineare ottenuto linearizzando la funzione obiettivo quadratica e un algoritmo Smart Enumeration, che partiziona l’insieme di tutte le possibili soluzioni e risolve il pricing problem ristretto ad ogni singolo sottoinsieme. Per quanto riguarda le euristiche, proponiamo una local-search e una Early-terminated Smart Enumeration, in cui fermiamo la Smart Enumeration non appena troviamo una variabile da aggiungere al RMP. L’andamento oscillatorio delle variabili duali coinvolte nella definizione del pricing problem viene limitato stabilizzandole: consideriamo come coefficienti per il pricing problem una combinazione convessa tra i veri valori delle variabili duali e uno stability center scelto. Abbiamo esteso la dimostrazione di convergenza di questa procedura al caso in cui le variabili duali stabilizzate risultano essere ammissibili per il problema duale. Per risolvere il problema all’interezza, usiamo la soluzione del rilassamento lineare come punto di partenza per un algoritmo branch-and-price. La regola di branching proposta, e dimostrata corretta, si ispira alla famosa regola di Ryan-Foster per problemi di tipo set-partitioning. Ulteriori osservazioni sulla similitudine tra I vincoli della formulazione e un set-covering multiplo suggerisce di rilassare la formulazione. Tuttavia, l’algoritmo branch-and-price proposto finora potrebbe non restituire una soluzione ammissibile per HIPP, quindi bisogna ampliare la regola di branching proposta per recuperare la soluzione ottima. Abbiamo implementato l’algoritmo branch-and-price proposto in C++, con l’aiuto delle librerie SCIP e il solutore Cplex, e l’abbiamo testato su diverse classi di istanze usate in letteratura, sia simulate che derivanti da dati reali, e istanze create ad hoc. L’algoritmo proposto per la nostra formulazione risulta essere competitivo con i metodi di soluzione di formulazioni polinomiali dello stato dell’arte. Infatti, notiamo che la distanza tra il rilassamento lineare e l’inviluppo convesso è minore per la nostra formulazione rispetto alle altre formulazioni. I risultati computazionali mostrano l’efficienza del nostro algoritmo, in particolare sull’insieme di istanze che presentano un numero maggiore di genotipi. Abbiamo quindi mostrato come una procedura di tipo branch-and-price fornisce un buon metodo per risolvere problemi la cui formulazione presenta un numero esponenziale di variabili e un numero polinomiale di vincoli. Futuri sviluppi possono riguardare miglioramentii nei dettagli implementativi, ad esempio considerando diversi ordinamenti per i genotipi o combinando metodi esatti e altre euristiche per la risoluzione del pricing problem. Inoltre, si potrebbe generalizzare il metodo proposto per la risoluzione di problemi di tipo set-partitioning

    A Column Generation Approach for Pure Parsimony Haplotyping

    No full text
    International audienceno abstrac
    corecore